-
Notifications
You must be signed in to change notification settings - Fork 43
Update WorkQueue.queue to be a reference type instead of an array. #194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
We pass WorkQueue.queue 'by reference' through inout parameters and across thread suspension points. We noticed a stale value issue in release builds when passing Array with inout, likely due to the copy-on-write mechanism. To avoid stale values after a thread resumes from suspension, switch to a dedicated reference type Queue.
private nonisolated(unsafe) var queue: [BackgroundWorkItem] | ||
// Queue needs to be a reference type because we pass it inout | ||
final class Queue: Sendable { | ||
internal nonisolated(unsafe) var queue: [BackgroundWorkItem] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect because this type isn't Sendable. It allows mutation without synchronousiarion.
furthermore, the reference gets passed inout.
We should also add a test that shows where precisely you need reference semantics and why
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect because this type isn't Sendable. It allows mutation without synchronousiarion.
This type is exclusively used under a pthread_mutex_t. It is for that reason @unchecked Sendable
.
We should also add a test that shows where precisely you need reference semantics and why
I don't quite understand what do you mean. Could you suggest a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incorrect because this type isn't Sendable. It allows mutation without synchronousiarion.
This type is exclusively used under a pthread_mutex_t. It is for that reason @unchecked Sendable.
I get that this is intent but we should prove this to the compiler and not use @unchecked
, that's unsafe. We have helpers like NIOLockedValueBox, OSAllocatedUnfairLock and Mutex which can make this safe. And we should use those
I don't quite understand what do you mean. Could you suggest a test?
I assume that the reason for this pull request is that something didn't work as expected. The test should test that before it doesn't work as expected but now (with this change) it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that this is intent but we should prove this to the compiler and not use @unchecked, that's unsafe. We have helpers like NIOLockedValueBox, OSAllocatedUnfairLock and Mutex which can make this safe. And we should use those
We can't use OSAllocatedUnfairLock
nor Mutex
directly here because we need access to the underlying pthread_mutex_t
to pass to pthread_cond_wait
. Because of this, I deliberately made WorkQueue
NOT a generic LockedBox
type. Instead, we use Mutex
wherever we can and only use WorkQueue
for very specific purposes. On the other hand, what's wrong with using @unchecked Sendable
this way? Isn't this the canonical use case for @unchecked Sendable
(when you explicitly use external synchronization instead of relying on Swift itself)?
I assume that the reason for this pull request is that something didn't work as expected. The test should test that before it doesn't work as expected but now (with this change) it does.
This change resolves issue #182. Currently, the run()
function hangs on release builds on Darwin, and it no longer hangs on release builds with this PR. To prevent future regressions, I have added swift test -c release
to the CI pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but if you build a general purpose work queue, then I'd suggest
- put it into its own file
- Make it actually Sendable (currently it is not and just disables compiler checking)
- Add some real tests for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this code, it looks like Queue
isn't actually Sendable like @weissi pointed out. The callsites however can use it safely as long as it is always guarded by a pthread_mutex_t.
Is my understanding correct? If so, then I tend to agree that marking this type Sendable
is misleading. This code would still work fine even if we remove Sendable
from Queue
since you already declared the var as nonisolated(unsafe)
below:
private nonisolated(unsafe) var queue: Queue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My position is: Anything @unchecked
/unsafe
needs to be avoided. That's just switching off compiler checking.
A narrow exception is if you're creating new, reusable components (such as a generic queue) with an extensive test suite that cannot be used unsafely through their API, then that's fine.
For example, AsyncProcess doesn't contain @unchecked
s at all
linux_build_command: 'swift-format lint -s -r --configuration ./.swift-format . && swift test && swift test -c release && swift test --disable-default-traits' | ||
windows_swift_versions: '["6.1", "nightly-main"]' | ||
windows_build_command: | | ||
Invoke-Program swift test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add Invoke-Program swift test -c release
below this line so we have release coverage for Windows as well?
} | ||
|
||
func withLock<R>(_ body: (inout [BackgroundWorkItem]) throws -> R) rethrows -> R { | ||
func withLock<R>(_ body: (inout Queue) throws -> R) rethrows -> R { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of using inout anymore if it's a reference type?
#if canImport(WinSDK) | ||
SleepConditionVariableCS(self.waitCondition, mutex, INFINITE) | ||
#else | ||
pthread_cond_wait(self.waitCondition, mutex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the actual issue is that we sleep the thread here using the condition variable while still holding an exclusive reference to the queue, and resuming the thread later returns to this point and corrupts state.
I wonder if we can resolve this concern by doing something similar to what @weissi said and decoupling the work queue from the condition variable logic using an abstraction similar to what I proposed here: https://forums.swift.org/t/thoughts-on-a-conditionvariable-type/81123
I think that may solve the problem by ensuring we release ownership of the inout before suspending the thread, due to the way the API is structured -- you can't interact with the condition variable and suspend inside the user-provided closure. Though in your case I think you'd want a variant of my wait
method that lets you suspend prior to running block instead of after.
Then define WorkQueue as:
private final class WorkQueue: Sendable {
private let queue: ConditionVariable<[BackgroundItem]>
}
We pass WorkQueue.queue 'by reference' through inout parameters and across thread suspension points. We noticed a stale value issue in release builds when passing Array with inout, likely due to the copy-on-write mechanism. To avoid stale values after a thread resumes from suspension, switch to a dedicated reference type Queue.
Resolves: #182